ARABASE: A Relational Database for Arabic OCR Systems

Authors

  • Najoua Essoukri Ben Amara
  • Omar Mazhoud
  • Noura Bouzrara
  • Noureddine Ellouze

Abstract

In this paper we present a database for research in the optical recognition of Arabic off-line and on-line handwriting, as well as of machine-printed text. ARABASE includes digital images of documents, text phrases, words/sub-words, isolated characters, digits, signatures, and so on. The data correspond to a variety of lexicons (city names, literal amounts, isolated characters, digits, free texts, etc.). The organization of the database offers useful facilities that an Arabic writing recognition system can exploit. A dedicated tool with a graphical interface lets the user experiment with various classical image-processing tasks.
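To illustrate how a relational organization lets a recognition system pull out exactly the samples it needs (by granularity, acquisition mode, and lexicon), here is a minimal sketch using Python's built-in `sqlite3` module. The abstract does not describe the actual ARABASE schema. The table names (`lexicon`, `sample`), the columns, the allowed values, the function `select_samples`, and the demo rows are all hypothetical and serve only as illustration.

```python
import sqlite3

# HYPOTHETICAL schema: names and columns are illustrative assumptions,
# not the real ARABASE tables.
SCHEMA = """
CREATE TABLE lexicon (
    lexicon_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE      -- e.g. 'city names', 'literal amounts'
);
CREATE TABLE sample (
    sample_id   INTEGER PRIMARY KEY,
    image_path  TEXT NOT NULL,           -- location of the digitized image
    granularity TEXT NOT NULL CHECK (granularity IN
        ('document', 'phrase', 'word', 'subword', 'character', 'digit', 'signature')),
    mode        TEXT NOT NULL CHECK (mode IN ('offline', 'online', 'printed')),
    lexicon_id  INTEGER REFERENCES lexicon(lexicon_id),
    label       TEXT                     -- ground-truth transcription, if any
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with a few toy (hypothetical) rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Inserted in this order, the lexicons receive ids 1, 2 and 3.
    conn.executemany(
        "INSERT INTO lexicon (name) VALUES (?)",
        [("city names",), ("literal amounts",), ("digits",)],
    )
    conn.executemany(
        "INSERT INTO sample (image_path, granularity, mode, lexicon_id, label) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            ("img/w001.png", "word", "offline", 1, "تونس"),
            ("img/w002.png", "word", "printed", 1, "صفاقس"),
            ("img/p001.png", "phrase", "offline", 2, "مائة دينار"),
            ("img/d001.png", "digit", "online", 3, "7"),
        ],
    )
    conn.commit()
    return conn


def select_samples(conn: sqlite3.Connection, granularity: str, mode: str,
                   lexicon: str) -> list[tuple[str, str]]:
    """Return (image_path, label) pairs matching all three criteria."""
    return conn.execute(
        """
        SELECT s.image_path, s.label
        FROM sample AS s
        JOIN lexicon AS l ON s.lexicon_id = l.lexicon_id
        WHERE s.granularity = ? AND s.mode = ? AND l.name = ?
        ORDER BY s.sample_id
        """,
        (granularity, mode, lexicon),
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    # Off-line handwritten words drawn from the city-name lexicon.
    print(select_samples(db, "word", "offline", "city names"))
```

With this kind of design, adding a new lexicon or sample type is a row insert rather than a new file layout, and a training or test set is simply the result of a query.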




Journal title:
  • Int. Arab J. Inf. Technol.

Volume 2


Publication date: 2005